<!DOCTYPE html>
<html lang="en">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Improving GCC's Interprocedural Optimization Infrastructure</title>
<link rel="stylesheet" type="text/css" href="https://gcc.gnu.org/gcc.css" />
</head>
<body>
<h1>Improving GCC's Interprocedural Optimizaion Infrastructure</h1>

<p>This page describes ongoing work to improve GCC's infrastructure
for tree-based interprocedural optimizers. The work is done on a
branch in GCC's CVS repository called <code>tree-profiling-branch</code>.</p>

<p>The short-term goals of the branch are to bring profiling heuristics
and profile feedback information to the tree level, to add support for
managing multiple control flow graphs, and to implement profile guided
function inlining.  At the heart of this project is the call graph code
added for GCC 3.4, and the effort to maintain the control flow graph and
profile information over expanding from GIMPLE, which is the tree-based
intermediate representation, to RTL, the intermediate representation of
GCC's back ends.</p>

<p>Longer-term goals include adding support for doing some optimizations
(such as CCP and DCE) before inlining, and implementing a framework for
other interprocedural optimizations, such as interprocedural constant
propagation and side-effects analysis.</p>

<p>As the patches stabilize, they will be submitted for review and
acceptance into the mainline development tree. Please contact <a
href="mailto:jh@suse.cz">Jan Hubicka &lt;jh@suse.cz&gt;</a> if you
would like to contribute.</p>

<h2>Bringing profile information to the tree optimization framework</h2>

<h3>Rationale</h3>

<p>When the compiler knows where optimizations are beneficial, it can
do a better job at generating good code for an application.  GCC has a
framework for profile-guided optimizations, but the profile is not
available early enough for several optimizations that could benefit
from it.  Specifically, function inlining, code generation for multiway
branches, various loop transformations, and value histogram based
optimizations are done, or could be done better, in the new Tree SSA
framework where profile information is not available.  Making it
available to these passes would allow GCC to generate better code.</p>

<h3>Current situation in GCC</h3>

<p>As of June 2004, GCC does support profile feedback and branch
prediction and several algorithms use the profile to improve performance.
The compiler can obtain profile information by trying to predict all
branches appearing in the intermediate representation for the application
during the compilation process, or from profile feedback collected during
test runs of the application.</p>

<p>Unfortunately, the profile is not available until very late in the
compilation process.  To understand why, you need to understand what
profile information is to the compiler, and how it is represented.</p>

<p>While the true execution profile is a program property, for the
compiler it is a control flow graph annotation.  Therefore, the profile
fed back from test runs is in reality the output of the application
instrumented with code on control flow graph edges and basic blocks to
write out execution counts.  This means that the compiler must read the
fed back profile at the same point in the compilation process where the
control flow graph was previously instrumented for the test runs,
because otherwise the profile data may not match with the control flow
graph for the intermediate representation at that point in the
compilation process.</p>

<p>After instrumenting or reading, the compiler has to update the profile
for each modification it does to the control flow graph.  So it has to
instrument the application for profiling somewhere between the first
point in the compilation process from where the CFG will be kept up
to date, and the passes for which having profile information available
is the most profitable.</p>

<p>In GCC, this means that profile information cannot be made available
to passes that run early in the compilation process.  The old loop passes
do not preserve the control flow graph, and when the CFG is destroyed,
any profile information available at that point is also destroyed.
The passes which benefit the most from profile information available,
such as scheduling and block reordering,all have to run after the loop
optimizer.  So in GCC we have to instrument the control flow graph for
for profiling between running the old loop optimizer and the final
scheduling pass.</p>

<p>To make things worse, the control flow graph is also destroyed before
the GIMPLE tree is expanded to RTL, and a profiling infrastructure for
trees is not available.  So simply replacing the old loop is not enough
to bring profile information to the Tree SSA passes.  GCC needs a way to
preserve profile information when expanding from trees to RTL.</p>

<h3>Status on the branch</h3>

<p>The old loop optimizer has been disabled in anticipation of its
replacement by the rewritten loop optimizers from the
<code>cfg-branch</code> and the <code>lno-branch</code>.
This means that all RTL passes now maintain the CFG from the start of
<code>rest_of_compilation</code> all the way through the reorg pass.
Also, GCC now knows how to maintain the control flow graph when expanding
from GIMPLE to RTL.  The CFG-aware expand code can be found in the new
file <code>cfgexpand.c</code>.  Code to instrument the GIMPLE tree for
profiling is being worked on also.</p>


<h2>Profile guided inlining</h2>

<h3>Rationale</h3>

<p>Judicious inlining of functions is important for good performance.
Inlining avoids function call overhead (register saves and restores,
parameter passing ABI (using specific registers, setting up the stack),
the call itself) and exposes larger functions to the optimizers for
for deeper enhancements.  However, inlining may cause exponential code
growth.  Therefore, inlining of nontrivial functions called from rarely
executed basic blocks is almost always a loss.</p>

<p>This implies that the compiler should have some means to estimate the
profitability of inlining using information extracted from an execution
profile.  The profile may be estimated from branch prediction heuristics,
or obtained using trial runs and fed back to the compiler.  Profiling can
reveal details about the actual usage of functions in the application to
direct the compiler to inline better and more effectively.</p>

<h3>Current situation in GCC</h3>

<p>Even if a profile was available, the current inliner would destroy it
for inlined functions because profile information is inherently associated
with the control flow graph, which is also not built until after inlining.
So a new inliner is needed that can inline function control flow graphs
instead of functions as trees.  The compiler will have to construct and
maintain a CFG for every function, instead of only for the function that
is currently being compiled.</p>

<h3>Status on the branch</h3>

<p>Support for multiple CFGs has recently been committed, and work is under
way on a new inliner that inlines a function's CFG instead of a GIMPLE
tree.  Our goal is to have this work ready in time for inclusion in the
next release of GCC.</p>


<h2>Framework for Interprocedural Analysis and Optimizations</h2>

<h3>Rationale</h3>

<p>Function calls hide information about side effects from the optimizers.
By inlining a function into a caller, all such information is made
available to the caller, but at the cost of increased code size and
degraded instruction locality.  Also it is not always possible to inline
a function, for example when it has unpredictable side effects (e.g. it
calls setjmp or it has non-local gotos).</p>

<p>Interprocedural data flow analyses (IPA) attempt to provide a different
means to overcome the information-hiding problem of function calls.  They
typically analyse the intraprocedural properties of a function,
and propagate that information across the call graph.  The compiler can
then use that information to improve intraprocedural information, such as
alias and dependency information, and generate better code for any single
function.  It can also choose to specialize a function, to remove paths
from a function that will never be taken, or to inline functions for which
intraprocedural analysis can not prove that inlining is profitable.</p>

<p>Experience from other compilers shows that interprocedural constant
propagation and interprocedural side-effects analysis may help
disambiguate memory dependencies, and that function specialization by
cloning can significantly reduce the cost of function calls in highly
abstracted applications (such as C++ applications using the STL).  For
GCC we believe that implementing these interprocedural analyses would
make it possible to greatly improve the generated code in many
applications.</p>

<h3>Current situation in GCC</h3>

<p>At present, function inlining is the only interprocedural optimization
implemented in GCC.  Until recently, a framework for IPA was lacking, and
the intermediate representation was too low-level to allow interprocedural
analyses to be effective.</p>

<p>With the contribution of code to construct and analyze the call graph,
and with the new optimization framework and higher level intermediate
representations from the Tree SSA project, an IPA framework is within
reach.  The major remaining technical obstacles for IPA are inefficient
data structures for the <code>tree</code> intermediate representations,
which may cause memory consumption to rise excessively when used as-is in
an IPA framework.  Other obstacles of non-technical nature also make it
it impossible at present to implement across-file interprocedural
optimizations other than those provided by the intermodule framework,
which also suffers from the inefficient structure of the
<code>tree</code> representations.</p>

<h3>Status on the branch</h3>

<p>At present, no interprocedural work is being done on the branch, but
a patch for interprocedural constant propagation is available and will
probably be included on the branch soonish.  Otherwise, plans exist for
reorganisation of the compiler passes as follows:</p>

<ol>
<li>For each SCC in the call graph in reverse topological order<br />
    <ul>
    <li>do simple intraprocedural optimizations in SSA form (basically,
	run anything that is cheap and can not create overlapping live
	ranges, but will improve the results of the next step);</li>
    <li>collect intraprocedural information necessary for
	interprocedural data flow analyses;</li>
    <li>store the function's intermediate representation somewhere.</li>
    </ul></li>
<li>Perform interprocedural analyses and optimizations (at least constant
    propagation, side-effects analysis, and function cloning and
    specialization).</li>
<li>For each SCC in the call graph in reverse topological order
    <ul>
    <li>do remaining intraprocedural optimizations;</li>
    <li>hand over the optimized function to the RTL expanders for
	finalization and code generation.</li>
    </ul></li>
</ol>

</body>
</html>
